Record original node selectors #660
✅ Deploy Preview for kubernetes-sigs-kueue ready!
Hi @trasc. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @alculquicondor /cc @mwielgus /cc @mimowo
/ok-to-test
Some comments around error handling though, and a question if there is a possibility of a malicious actor injecting unmarshalable node selectors. Either directly in the job annotation or via a recreated workload object.
6a6c502
to
f1ab6b8
Compare
// node selectors are recorded upon a workload admission. This information,
// if present, will be used to restore them if a workload is deleted while
// it is admitted. The content is a json marshaled slice of selectors.
OriginalNodeSelectorsAnnotation = "kueue.x-k8s.io/original-node-selectors"
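To make the "json marshaled slice of selectors" concrete, here is a minimal sketch of the round trip through the annotation. It is not the PR's implementation: the `podSetNodeSelector` type and the `recordSelectors` and `readSelectors` helpers are hypothetical names chosen for illustration, and only the annotation key comes from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key as quoted in the diff above.
const originalNodeSelectorsAnnotation = "kueue.x-k8s.io/original-node-selectors"

// podSetNodeSelector is a hypothetical entry type: one pod set name plus
// the node selector it had before admission.
type podSetNodeSelector struct {
	Name         string            `json:"name"`
	NodeSelector map[string]string `json:"nodeSelector"`
}

// recordSelectors stores the selectors as a JSON string in the annotations map.
func recordSelectors(annotations map[string]string, selectors []podSetNodeSelector) error {
	raw, err := json.Marshal(selectors)
	if err != nil {
		return fmt.Errorf("marshaling node selectors: %w", err)
	}
	annotations[originalNodeSelectorsAnnotation] = string(raw)
	return nil
}

// readSelectors decodes the annotation and reports a missing or malformed value.
func readSelectors(annotations map[string]string) ([]podSetNodeSelector, error) {
	raw, found := annotations[originalNodeSelectorsAnnotation]
	if !found {
		return nil, fmt.Errorf("annotation %q not found", originalNodeSelectorsAnnotation)
	}
	var selectors []podSetNodeSelector
	if err := json.Unmarshal([]byte(raw), &selectors); err != nil {
		return nil, fmt.Errorf("unmarshaling node selectors: %w", err)
	}
	return selectors, nil
}

func main() {
	annotations := map[string]string{}
	original := []podSetNodeSelector{
		{Name: "main", NodeSelector: map[string]string{"disktype": "ssd"}},
	}
	if err := recordSelectors(annotations, original); err != nil {
		panic(err)
	}
	fmt.Println(annotations[originalNodeSelectorsAnnotation])

	restored, err := readSelectors(annotations)
	fmt.Println(restored, err)
}
```

Because the value is a plain string on the job object, writing it can ride along with the update that admits the job, and it stays readable to users inspecting the job.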
Have we weighed the suggestion? #518 (comment)
Yes, but it would unnecessarily complicate the workload lifecycle. Also, setting the annotation is done without any extra API calls.
The annotation is also useful as a record for users to look at
f1ab6b8
to
56be3aa
Compare
4d500be
to
68bc891
Compare
/test pull-kueue-test-integration-main
A couple more non-blocking nits. @alculquicondor over to you.
if err != nil {
	log.V(3).Error(err, "Unable to get original node selectors")
} else {
	job.RestoreNodeAffinity(selectors)
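The excerpt above treats a failed lookup as non-fatal: the error is logged and the restore is skipped. A minimal, self-contained sketch of that best-effort flow follows. The `job` struct, `originalSelectors`, and `stop` are hypothetical simplifications (one selector map per pod set, standard `log` instead of the controller's logger), not the types used in Kueue.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

const originalNodeSelectorsAnnotation = "kueue.x-k8s.io/original-node-selectors"

// job is a hypothetical stand-in for a suspendable job; each entry in
// NodeSelectors belongs to one pod set.
type job struct {
	Annotations   map[string]string
	NodeSelectors []map[string]string
	Suspended     bool
}

var errNoAnnotation = errors.New("original node selectors annotation not found")

// originalSelectors reads the selectors recorded at admission time.
func originalSelectors(j *job) ([]map[string]string, error) {
	raw, found := j.Annotations[originalNodeSelectorsAnnotation]
	if !found {
		return nil, errNoAnnotation
	}
	var selectors []map[string]string
	if err := json.Unmarshal([]byte(raw), &selectors); err != nil {
		return nil, err
	}
	return selectors, nil
}

// stop suspends the job and restores the recorded selectors when they can be
// read. It reports whether the restore happened; a failure is only logged.
func stop(j *job) bool {
	j.Suspended = true
	selectors, err := originalSelectors(j)
	if err != nil {
		log.Printf("Unable to get original node selectors: %v", err)
		return false
	}
	j.NodeSelectors = selectors
	return true
}

func main() {
	j := &job{
		Annotations:   map[string]string{originalNodeSelectorsAnnotation: `[{"disktype":"ssd"}]`},
		NodeSelectors: []map[string]string{{"disktype": "ssd", "zone": "a"}},
	}
	fmt.Println(stop(j), j.NodeSelectors)
}
```

In this sketch a job whose annotation is missing or malformed is still suspended; it simply keeps the selectors injected at admission.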
We could fall back to getting the selectors from the workload object.
That was my original approach, but it was changed during the review.
IIRC, in the original approach you first looked up the workload object and then fell back to the annotation.
Anyway, I think that now that we have made the annotation immutable, we can rely on it fully, unless I'm missing some scenario.
If such a scenario exists, my point was that we should add a comment explaining why this is done (what the scenario is). Otherwise we will end up with suspicious code that no one remembers or knows the reason for.
The order was inverted for performance reasons: getting the selectors from the workload would not have needed additional unmarshaling.
The scenario would be a job that is missing the annotation, which could happen if the job type in question (other than core.Job or MPIJob) does not block changes to the annotation while running.
Yes, but we have a webhook to prevent the change. So the only scenario is one where the webhook is malfunctioning for some reason and we have a bad actor.
However, if the webhook is malfunctioning and we have a bad actor, the actor could both modify the annotation and delete the workload, so the fallback would not work either. So, IIUC, it would not be bulletproof either; it would only misleadingly give that impression.
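For readers following the thread, the kind of check the webhook is relied on to perform can be sketched as below. This is an assumption about the shape of such a check, not Kueue's webhook code: `validateUpdate` and its `running` parameter are hypothetical, and a real admission webhook would operate on the full job objects.

```go
package main

import (
	"errors"
	"fmt"
)

const originalNodeSelectorsAnnotation = "kueue.x-k8s.io/original-node-selectors"

var errAnnotationImmutable = errors.New(
	"original node selectors annotation is immutable while the job is running")

// validateUpdate rejects an update that adds, removes, or changes the
// annotation while the job is running. Updates to a job that is not
// running are allowed.
func validateUpdate(oldAnnotations, newAnnotations map[string]string, running bool) error {
	if !running {
		return nil
	}
	oldValue, oldFound := oldAnnotations[originalNodeSelectorsAnnotation]
	newValue, newFound := newAnnotations[originalNodeSelectorsAnnotation]
	if oldFound != newFound || oldValue != newValue {
		return errAnnotationImmutable
	}
	return nil
}

func main() {
	before := map[string]string{originalNodeSelectorsAnnotation: `[{"disktype":"ssd"}]`}
	tampered := map[string]string{originalNodeSelectorsAnnotation: `[{"disktype":"hdd"}]`}

	fmt.Println(validateUpdate(before, before, true))
	fmt.Println(validateUpdate(before, tampered, true))
}
```

As noted in the thread, a check like this only helps while the webhook is actually running; it does not make the restore path robust against a bad actor when the webhook is down.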
Save and try to restore the original node selectors in/from a job annotation "kueue.x-k8s.io/original-selectors".
68bc891
to
1022e23
Compare
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, trasc The full list of commands accepted by this bot can be found here. The pull request process is described here
/label tide/merge-method-squash
aac2e55
to
776aefa
Compare
/unhold
/lgtm
What type of PR is this?
/kind feature
What this PR does / why we need it:
Records the original node selectors in an annotation, kueue.x-k8s.io/original-selectors. The content of this annotation is used to restore the job's node selectors if a workload is not present when the job is suspended (stopJob), which is the case when a workload is deleted while it is admitted.
Which issue(s) this PR fixes:
Fixes #518
Special notes for your reviewer: